Workshop Track - ICLR 2017: Neural Combinatorial Optimization with Reinforcement Learning

Authors

  • Irwan Bello
  • Hieu Pham
  • Quoc V. Le
  • Mohammad Norouzi
  • Samy Bengio
Abstract

We present a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent neural network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent neural network using a policy gradient method. Without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. These results, albeit still quite far from state-of-the-art, give insights into how neural networks can be used as a general tool for tackling combinatorial optimization problems.
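The training signal described in the abstract (negative tour length as reward, optimized with a policy gradient) can be illustrated with a minimal sketch. This is not the authors' model: the paper trains a recurrent neural network, whereas the sketch below swaps in a hypothetical tabular policy (`logits[i][j]` scores a move from city `i` to city `j`) so the REINFORCE update stays readable. The function names, the batch-mean baseline, and the hyperparameters are assumptions made for illustration only.

```python
import math
import random


def tour_length(coords, perm):
    """Length of the closed tour that visits coords in the order given by perm."""
    total = 0.0
    for k in range(len(perm)):
        x1, y1 = coords[perm[k]]
        x2, y2 = coords[perm[(k + 1) % len(perm)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def sample_tour(logits, rng):
    """Sample a permutation from a masked softmax over next-city logits.

    Returns the permutation and the gradient of its log-probability
    with respect to logits (same shape as logits).
    """
    n = len(logits)
    grad = [[0.0] * n for _ in range(n)]
    current = 0  # assumption: every tour starts at city 0
    perm = [current]
    visited = {current}
    while len(perm) < n:
        # Mask out visited cities so the sample is a valid permutation.
        choices = [j for j in range(n) if j not in visited]
        m = max(logits[current][j] for j in choices)
        weights = [math.exp(logits[current][j] - m) for j in choices]
        z = sum(weights)
        probs = [w / z for w in weights]
        nxt = rng.choices(choices, weights=probs)[0]
        # d log softmax / d logit = indicator(chosen) - probability
        for j, p in zip(choices, probs):
            grad[current][j] += (1.0 if j == nxt else 0.0) - p
        perm.append(nxt)
        visited.add(nxt)
        current = nxt
    return perm, grad


def reinforce_step(coords, logits, rng, batch=32, lr=0.1):
    """One REINFORCE update with a batch-mean baseline; returns mean tour length."""
    samples = [sample_tour(logits, rng) for _ in range(batch)]
    rewards = [-tour_length(coords, perm) for perm, _ in samples]
    baseline = sum(rewards) / batch
    n = len(logits)
    for (_, grad), reward in zip(samples, rewards):
        advantage = reward - baseline
        for i in range(n):
            for j in range(n):
                # Gradient ascent on expected reward.
                logits[i][j] += lr * advantage * grad[i][j] / batch
    return -baseline


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [(rng.random(), rng.random()) for _ in range(8)]
    logits = [[0.0] * len(coords) for _ in coords]
    for step in range(200):
        mean_len = reinforce_step(coords, logits, rng)
    print("mean sampled tour length after training:", mean_len)
```

A tabular policy like this only learns a single instance and cannot generalize to new city sets, which is precisely what the recurrent network in the paper is meant to do; the sketch only shows how the reward and the gradient estimate fit together.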

Similar resources

Multi-task learning with deep model based reinforcement learning

In recent years, model-free methods that use deep learning have achieved great success in many different reinforcement learning environments. Most successful approaches focus on solving a single task, while multi-task reinforcement learning remains an open problem. In this paper, we present a model based approach to deep reinforcement learning which we use to solve different tasks simultaneousl...

Neural Combinatorial Optimization with Reinforcement Learning

This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent neural network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the rec...

Learning Evaluation Functions

Evaluation functions are an essential component of practical search algorithms for optimization, planning and control. Examples of such algorithms include hill-climbing, simulated annealing, best-first search, A*, and alpha-beta. In all of these, the evaluation functions are typically built manually by domain experts, and may require considerable tweaking to work well. I will investigate the thes...

End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning

In this paper, we present a neural network based task-oriented dialogue system that can be optimized end-to-end with deep reinforcement learning (RL). The system is able to track dialogue state, interface with knowledge bases, and incorporate query results into the agent's responses to successfully complete task-oriented dialogues. Dialogue policy learning is conducted with a hybrid supervised and ...

Salient Object Subitizing: Supplementary Material 1. Visualizing the CNN Subitizing Classifiers

[1] A. Karpathy. t-SNE visualization of CNN. http://cs.stanford.edu/people/karpathy/cnnembed/. [2] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS), 2012. [3] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. ...

Publication date: 2017